Overview

Dataset statistics

Number of variables: 40
Number of observations: 59400
Missing cells: 46743
Missing cells (%): 2.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.1 MiB
Average record size in memory: 320.0 B
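
The percentage and per-record figures above can be re-derived from the raw counts. The following is a minimal sketch, assuming that the missing-cell share is taken over every cell (variables × observations) and that 1 MiB means 2**20 bytes; the report itself does not state either formula.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 40
n_observations = 59400
missing_cells = 46743
total_size_mib = 18.1

# Assumption: the missing-cell share is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.0f} B")
```

Because the reported memory size is itself rounded to one decimal, the recomputed record size only approximates the reported 320.0 B.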

Variable types

Numeric: 10
DateTime: 1
Text: 7
Categorical: 20
Boolean: 2

Alerts

recorded_by has constant value "GeoData Consultants Ltd" [Constant]
basin is highly overall correlated with construction_year and 3 other fields [High correlation]
construction_year is highly overall correlated with basin and 3 other fields [High correlation]
extraction_type is highly overall correlated with extraction_type_class and 3 other fields [High correlation]
extraction_type_class is highly overall correlated with extraction_type and 3 other fields [High correlation]
extraction_type_group is highly overall correlated with extraction_type and 3 other fields [High correlation]
gps_height is highly overall correlated with construction_year and 1 other field [High correlation]
latitude is highly overall correlated with basin and 1 other field [High correlation]
longitude is highly overall correlated with basin and 1 other field [High correlation]
management is highly overall correlated with management_group and 1 other field [High correlation]
management_group is highly overall correlated with management and 1 other field [High correlation]
payment is highly overall correlated with payment_type [High correlation]
payment_type is highly overall correlated with payment [High correlation]
population is highly overall correlated with construction_year and 1 other field [High correlation]
quality_group is highly overall correlated with water_quality [High correlation]
quantity is highly overall correlated with quantity_group [High correlation]
quantity_group is highly overall correlated with quantity [High correlation]
region is highly overall correlated with basin and 4 other fields [High correlation]
region_code is highly overall correlated with region [High correlation]
scheme_management is highly overall correlated with management and 1 other field [High correlation]
source is highly overall correlated with source_class and 1 other field [High correlation]
source_class is highly overall correlated with source and 1 other field [High correlation]
source_type is highly overall correlated with source and 1 other field [High correlation]
water_quality is highly overall correlated with quality_group [High correlation]
waterpoint_type is highly overall correlated with extraction_type and 3 other fields [High correlation]
waterpoint_type_group is highly overall correlated with extraction_type and 3 other fields [High correlation]
public_meeting is highly imbalanced (56.3%) [Imbalance]
management_group is highly imbalanced (69.3%) [Imbalance]
water_quality is highly imbalanced (71.3%) [Imbalance]
quality_group is highly imbalanced (68.0%) [Imbalance]
funder has 3637 (6.1%) missing values [Missing]
installer has 3655 (6.2%) missing values [Missing]
public_meeting has 3334 (5.6%) missing values [Missing]
scheme_management has 3878 (6.5%) missing values [Missing]
scheme_name has 28810 (48.5%) missing values [Missing]
permit has 3056 (5.1%) missing values [Missing]
amount_tsh is highly skewed (γ1 = 57.80779995) [Skewed]
num_private is highly skewed (γ1 = 91.93374999) [Skewed]
id is uniformly distributed [Uniform]
id has unique values [Unique]
amount_tsh has 41639 (70.1%) zeros [Zeros]
gps_height has 20438 (34.4%) zeros [Zeros]
longitude has 1812 (3.1%) zeros [Zeros]
num_private has 58643 (98.7%) zeros [Zeros]
population has 21381 (36.0%) zeros [Zeros]
construction_year has 20709 (34.9%) zeros [Zeros]
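
The percentages in the Missing and Zeros alerts are each count divided by the 59400 observations. A short sketch that reproduces a few of them follows; the helper name `share` is hypothetical and chosen only for this illustration.

```python
N_OBSERVATIONS = 59400  # "Number of observations" from the overview


def share(count: int, total: int = N_OBSERVATIONS) -> str:
    """Format a count as a percentage of all rows, to one decimal place."""
    return f"{100 * count / total:.1f}%"


# Counts copied from the Missing and Zeros alerts above.
alert_counts = {
    "scheme_name missing": 28810,
    "amount_tsh zeros": 41639,
    "num_private zeros": 58643,
    "longitude zeros": 1812,
}
for label, count in alert_counts.items():
    print(f"{label}: {count} ({share(count)})")
```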

Reproduction

Analysis started: 2025-04-19 01:29:19.159288
Analysis finished: 2025-04-19 01:29:50.827046
Duration: 31.67 seconds
Software version: ydata-profiling v4.12.2
Download configuration: config.json
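
The reported duration is the difference between the two timestamps above. A minimal check using only the standard library (the format string is an assumption about how the timestamps are written):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2025-04-19 01:29:19.159288", FMT)
finished = datetime.strptime("2025-04-19 01:29:50.827046", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```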

Variables

id
Real number (ℝ)

Uniform  Unique 

Distinct: 59400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.132
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3730.9
Q1: 18519.75
median: 37061.5
Q3: 55656.5
95-th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75

Descriptive statistics

Standard deviation: 21453.128
Coefficient of variation (CV): 0.57801569
Kurtosis: -1.201515
Mean: 37115.132
Median Absolute Deviation (MAD): 18568.5
Skewness: 0.0026225303
Sum: 2.2046388 × 10^9
Variance: 4.6023672 × 10^8
Monotonicity: Not monotonic
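
Several of the id statistics above are simple functions of the others. The sketch below recomputes the range, IQR, and CV from the reported values, and compares the reported standard deviation with that of a continuous uniform distribution on the same interval. The comparison is only illustrative: it is an assumption that this is a useful reading of the "Uniform" alert, not a description of the test the profiler runs.

```python
# Values copied from the id statistics above.
minimum, maximum = 0, 74247
q1, q3 = 18519.75, 55656.5
mean, std = 37115.132, 21453.128

value_range = maximum - minimum  # reported Range: 74247
iqr = q3 - q1                    # reported IQR: 37136.75
cv = std / mean                  # reported CV: 0.57801569

# A continuous uniform distribution on [a, b] has standard deviation
# (b - a) / sqrt(12) and an excess kurtosis of exactly -1.2.
uniform_std = (maximum - minimum) / 12 ** 0.5

print(value_range, iqr, round(cv, 6), round(uniform_std, 1))
```

The uniform reference value lies within one percent of the reported standard deviation, and the reported kurtosis of -1.201515 is close to -1.2.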
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
69572 | 1 | < 0.1%
27851 | 1 | < 0.1%
6924 | 1 | < 0.1%
61097 | 1 | < 0.1%
48517 | 1 | < 0.1%
62700 | 1 | < 0.1%
48914 | 1 | < 0.1%
479 | 1 | < 0.1%
12824 | 1 | < 0.1%
21909 | 1 | < 0.1%
Other values (59390) | 59390 | > 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
74247 | 1 | < 0.1%
74246 | 1 | < 0.1%
74243 | 1 | < 0.1%
74242 | 1 | < 0.1%
74240 | 1 | < 0.1%
74239 | 1 | < 0.1%
74238 | 1 | < 0.1%
74237 | 1 | < 0.1%
74236 | 1 | < 0.1%
74235 | 1 | < 0.1%

amount_tsh
Real number (ℝ)

Skewed  Zeros 

Distinct: 98
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.65038
Minimum: 0
Maximum: 350000
Zeros: 41639
Zeros (%): 70.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.5746
Coefficient of variation (CV): 9.43671
Kurtosis: 4903.5431
Mean: 317.65038
Median Absolute Deviation (MAD): 0
Skewness: 57.8078
Sum: 18868433
Variance: 8985453.2
Monotonicity: Not monotonic
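
The skewness above is large because a few very high amounts sit far from a mass of zeros. The sketch below computes the adjusted Fisher-Pearson sample skewness on toy data. Whether the profiler uses exactly this estimator is an assumption, and the toy list is invented for illustration rather than drawn from the dataset.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (bias-corrected sample skewness)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Toy data (hypothetical): mostly zeros with one extreme value.
toy_amounts = [0, 0, 0, 0, 0, 0, 0, 20, 500, 350000]
print(sample_skewness(toy_amounts))
print(sample_skewness([1, 2, 3, 4, 5]))  # symmetric data
```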
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 41639 | 70.1%
500 | 3102 | 5.2%
50 | 2472 | 4.2%
1000 | 1488 | 2.5%
20 | 1463 | 2.5%
200 | 1220 | 2.1%
100 | 816 | 1.4%
10 | 806 | 1.4%
30 | 743 | 1.3%
2000 | 704 | 1.2%
Other values (88) | 4947 | 8.3%

Value | Count | Frequency (%)
0 | 41639 | 70.1%
0.2 | 3 | < 0.1%
0.25 | 1 | < 0.1%
1 | 3 | < 0.1%
2 | 13 | < 0.1%
5 | 376 | 0.6%
6 | 190 | 0.3%
7 | 69 | 0.1%
9 | 1 | < 0.1%
10 | 806 | 1.4%

Value | Count | Frequency (%)
350000 | 1 | < 0.1%
250000 | 1 | < 0.1%
200000 | 1 | < 0.1%
170000 | 1 | < 0.1%
138000 | 1 | < 0.1%
120000 | 1 | < 0.1%
117000 | 7 | < 0.1%
100000 | 3 | < 0.1%
70000 | 1 | < 0.1%
60000 | 1 | < 0.1%

Distinct: 356
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB
Minimum: 2002-10-14 00:00:00
Maximum: 2013-12-03 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

funder
Text

Missing 

Distinct: 1896
Distinct (%): 3.4%
Missing: 3637
Missing (%): 6.1%
Memory size: 464.2 KiB

Length

Max length: 30
Median length: 27
Mean length: 9.930115
Min length: 1

Characters and Unicode

Total characters: 553733
Distinct characters: 69
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 974
Unique (%): 1.7%

Sample

1st row: Roman
2nd row: Grumeti
3rd row: Lottery Club
4th row: Unicef
5th row: Action In A

Value | Count | Frequency (%)
of | 9748 | 10.8%
government | 9276 | 10.3%
tanzania | 9172 | 10.1%
danida | 3123 | 3.5%
world | 2789 | 3.1%
water | 2645 | 2.9%
hesawa | 2203 | 2.4%
bank | 1416 | 1.6%
rwssp | 1376 | 1.5%
kkkt | 1370 | 1.5%
Other values (2064) | 47252 | 52.3%
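
The table above counts individual words rather than whole funder names, which is why a token such as "of" can lead the list. A minimal sketch of such a count follows, assuming the tokens are lowercased and split on whitespace; the profiler's exact tokenisation is not stated in the report, and the function name is hypothetical.

```python
from collections import Counter


def word_frequencies(values):
    """Count lowercase whitespace-separated tokens, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(value.lower().split())
    return counts


# The five sample rows shown above, plus one missing value.
sample = ["Roman", "Grumeti", "Lottery Club", "Unicef", "Action In A", None]
freq = word_frequencies(sample)
total = sum(freq.values())
for word, count in freq.most_common(3):
    print(f"{word}: {count} ({100 * count / total:.1f}%)")
```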

Most occurring characters

Value | Count | Frequency (%)
a | 68200 | 12.3%
n | 57840 | 10.4%
i | 38011 | 6.9%
e | 37462 | 6.8%
(space) | 34673 | 6.3%
r | 27879 | 5.0%
t | 23016 | 4.2%
o | 22739 | 4.1%
s | 17208 | 3.1%
d | 15464 | 2.8%
Other values (59) | 211241 | 38.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 553733 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 553733 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 553733 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

gps_height
Real number (ℝ)

High correlation  Zeros 

Distinct: 2428
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.29724
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Negative: 1496
Negative (%): 2.5%
Memory size: 464.2 KiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
median: 369
Q3: 1319.25
95-th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.11635
Coefficient of variation (CV): 1.0371378
Kurtosis: -1.2924401
Mean: 668.29724
Median Absolute Deviation (MAD): 369
Skewness: 0.46240208
Sum: 39696856
Variance: 480410.28
Monotonicity: Not monotonic
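
The Median Absolute Deviation listed above is the median of the absolute distances from the median. A short sketch on toy data follows, assuming the unscaled definition with no normalising constant; the toy heights are invented for illustration.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Toy data (hypothetical): several zeros and a few larger heights.
toy_heights = [0, 0, 0, 369, 1290, 1319, 2770]
print(median_absolute_deviation(toy_heights))
```

Unlike the standard deviation, this measure barely moves when a single extreme value is added, which makes it a useful companion for zero-heavy columns.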
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 20438 | 34.4%
-15 | 60 | 0.1%
-16 | 55 | 0.1%
-13 | 55 | 0.1%
1290 | 52 | 0.1%
-20 | 52 | 0.1%
-14 | 51 | 0.1%
303 | 51 | 0.1%
-18 | 49 | 0.1%
-19 | 47 | 0.1%
Other values (2418) | 38490 | 64.8%

Value | Count | Frequency (%)
-90 | 1 | < 0.1%
-63 | 2 | < 0.1%
-59 | 1 | < 0.1%
-57 | 1 | < 0.1%
-55 | 1 | < 0.1%
-54 | 1 | < 0.1%
-53 | 1 | < 0.1%
-52 | 2 | < 0.1%
-51 | 2 | < 0.1%
-50 | 5 | < 0.1%

Value | Count | Frequency (%)
2770 | 1 | < 0.1%
2628 | 1 | < 0.1%
2627 | 1 | < 0.1%
2626 | 2 | < 0.1%
2623 | 1 | < 0.1%
2614 | 1 | < 0.1%
2585 | 1 | < 0.1%
2576 | 1 | < 0.1%
2569 | 1 | < 0.1%
2568 | 1 | < 0.1%

installer
Text

Missing 

Distinct: 2145
Distinct (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 464.2 KiB

Length

Max length: 30
Median length: 29
Mean length: 6.1112028
Min length: 1

Characters and Unicode

Total characters: 340669
Distinct characters: 70
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1098
Unique (%): 2.0%
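
"Distinct" and "Unique" measure different things: installer has 2145 distinct values, of which 1098 are listed as unique. The sketch below assumes that "unique" means a value occurring exactly once, which is consistent with these figures but is not spelled out in the report; the function name and toy list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for the non-missing values.

    distinct: number of different values.
    unique:   number of values that occur exactly once (assumed meaning).
    """
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data (hypothetical), including one missing entry.
toy_installers = ["DWE", "DWE", "RWE", "KKKT", None]
print(distinct_and_unique(toy_installers))
```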

Sample

1st row: Roman
2nd row: GRUMETI
3rd row: World vision
4th row: UNICEF
5th row: Artisan

Value | Count | Frequency (%)
dwe | 17601 | 25.8%
government | 2778 | 4.1%
water | 1881 | 2.8%
hesawa | 1395 | 2.0%
rwe | 1230 | 1.8%
district | 1216 | 1.8%
kkkt | 1153 | 1.7%
council | 1106 | 1.6%
commu | 1065 | 1.6%
danida | 1051 | 1.5%
Other values (1976) | 37806 | 55.4%

Most occurring characters

Value | Count | Frequency (%)
D | 27595 | 8.1%
W | 25849 | 7.6%
E | 25389 | 7.5%
a | 17343 | 5.1%
n | 16558 | 4.9%
e | 15500 | 4.5%
i | 15053 | 4.4%
A | 13668 | 4.0%
r | 13377 | 3.9%
t | 12904 | 3.8%
Other values (60) | 157433 | 46.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 340669 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 340669 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 340669 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

longitude
Real number (ℝ)

High correlation  Zeros 

Distinct: 57516
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.077427
Minimum: 0
Maximum: 40.345193
Zeros: 1812
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 30.04066
Q1: 33.090347
median: 34.908743
Q3: 37.178387
95-th percentile: 39.13324
Maximum: 40.345193
Range: 40.345193
Interquartile range (IQR): 4.0880392

Descriptive statistics

Standard deviation: 6.5674318
Coefficient of variation (CV): 0.19272089
Kurtosis: 19.187031
Mean: 34.077427
Median Absolute Deviation (MAD): 2.0325111
Skewness: -4.1910465
Sum: 2024199.1
Variance: 43.131161
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
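
With fixed-size bins, the bin a value lands in depends only on the column's minimum, maximum, and bin count. The sketch below assumes 50 equal-width bins spanning the reported longitude minimum and maximum, with the top edge included in the last bin; this is an assumption about the plot, not taken from the profiler's code.

```python
def fixed_width_bin(value, lo, hi, bins):
    """Index of the equal-width bin holding value; the top edge joins the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)


# Longitude minimum and maximum from the statistics above, bins=50.
LO, HI, BINS = 0.0, 40.345193, 50

# A zero longitude, the smallest non-zero longitude listed, and the maximum.
for lon in (0.0, 29.6071219, 40.345193):
    print(lon, "-> bin", fixed_width_bin(lon, LO, HI, BINS))
```

Under these assumptions the 1812 zero longitudes occupy the first bin on their own, separated by a wide run of empty bins from the rest of the data.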
Value | Count | Frequency (%)
0 | 1812 | 3.1%
37.37571687 | 2 | < 0.1%
38.34050134 | 2 | < 0.1%
39.08618257 | 2 | < 0.1%
33.00503158 | 2 | < 0.1%
39.09178536 | 2 | < 0.1%
32.98751118 | 2 | < 0.1%
37.23632569 | 2 | < 0.1%
39.08628657 | 2 | < 0.1%
39.08596496 | 2 | < 0.1%
Other values (57506) | 57570 | 96.9%

Value | Count | Frequency (%)
0 | 1812 | 3.1%
29.6071219 | 1 | < 0.1%
29.60720109 | 1 | < 0.1%
29.61032056 | 1 | < 0.1%
29.61096482 | 1 | < 0.1%
29.61194674 | 1 | < 0.1%
29.61250689 | 1 | < 0.1%
29.61276296 | 1 | < 0.1%
29.61344309 | 1 | < 0.1%
29.6168718 | 1 | < 0.1%

Value | Count | Frequency (%)
40.34519307 | 1 | < 0.1%
40.34430089 | 1 | < 0.1%
40.32523996 | 1 | < 0.1%
40.32522643 | 1 | < 0.1%
40.32340181 | 1 | < 0.1%
40.32283237 | 1 | < 0.1%
40.32280453 | 1 | < 0.1%
40.3226251 | 1 | < 0.1%
40.32216902 | 1 | < 0.1%
40.32196593 | 1 | < 0.1%

latitude
Real number (ℝ)

High correlation 

Distinct: 57517
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.7060327
Minimum: -11.64944
Maximum: -2 × 10^-8
Zeros: 0
Zeros (%): 0.0%
Negative: 59400
Negative (%): 100.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: -11.64944
5-th percentile: -10.58555
Q1: -8.5406213
median: -5.0215966
Q3: -3.3261556
95-th percentile: -1.4088722
Maximum: -2 × 10^-8
Range: 11.64944
Interquartile range (IQR): 5.2144657

Descriptive statistics

Standard deviation: 2.9460191
Coefficient of variation (CV): -0.51629902
Kurtosis: -1.0576167
Mean: -5.7060327
Median Absolute Deviation (MAD): 2.0700299
Skewness: -0.15203657
Sum: -338938.34
Variance: 8.6790284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
-2 × 10^-8 | 1812 | 3.1%
-6.98584173 | 2 | < 0.1%
-6.9802204 | 2 | < 0.1%
-2.47667983 | 2 | < 0.1%
-6.97826294 | 2 | < 0.1%
-7.07808103 | 2 | < 0.1%
-2.46524583 | 2 | < 0.1%
-2.4943533 | 2 | < 0.1%
-7.1772029 | 2 | < 0.1%
-2.51532072 | 2 | < 0.1%
Other values (57507) | 57570 | 96.9%

Value | Count | Frequency (%)
-11.64944018 | 1 | < 0.1%
-11.64837759 | 1 | < 0.1%
-11.58629656 | 1 | < 0.1%
-11.56857679 | 1 | < 0.1%
-11.56680457 | 1 | < 0.1%
-11.56450865 | 1 | < 0.1%
-11.56432357 | 1 | < 0.1%
-11.56231592 | 1 | < 0.1%
-11.56228898 | 1 | < 0.1%
-11.56161898 | 1 | < 0.1%

Value | Count | Frequency (%)
-2 × 10^-8 | 1812 | 3.1%
-0.99846435 | 1 | < 0.1%
-0.998916 | 1 | < 0.1%
-0.99901209 | 1 | < 0.1%
-0.99911702 | 1 | < 0.1%
-0.9994692 | 1 | < 0.1%
-0.99950651 | 1 | < 0.1%
-0.99952232 | 1 | < 0.1%
-1.00058519 | 1 | < 0.1%
-1.0015208 | 1 | < 0.1%

Distinct: 37399
Distinct (%): 63.0%
Missing: 2
Missing (%): < 0.1%
Memory size: 464.2 KiB

Length

Max length: 30
Median length: 25
Mean length: 10.962339
Min length: 1

Characters and Unicode

Total characters: 651141
Distinct characters: 75
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32928
Unique (%): 55.4%

Sample

1st row: none
2nd row: Zahanati
3rd row: Kwa Mahundi
4th row: Zahanati Ya Nanyumbu
5th row: Shuleni

Value | Count | Frequency (%)
kwa | 21384 | 19.6%
none | 3563 | 3.3%
mzee | 3385 | 3.1%
shuleni | 2123 | 1.9%
ya | 1499 | 1.4%
shule | 1389 | 1.3%
school | 1113 | 1.0%
primary | 1052 | 1.0%
zahanati | 983 | 0.9%
msingi | 870 | 0.8%
Other values (29461) | 71931 | 65.8%

Most occurring characters

Value | Count | Frequency (%)
a | 98806 | 15.2%
i | 52404 | 8.0%
(space) | 49898 | 7.7%
n | 42146 | 6.5%
e | 40983 | 6.3%
w | 31669 | 4.9%
K | 31385 | 4.8%
o | 30245 | 4.6%
u | 24217 | 3.7%
M | 22040 | 3.4%
Other values (65) | 227348 | 34.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 651141 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 651141 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 651141 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

num_private
Real number (ℝ)

Skewed  Zeros 

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47414141
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23623
Coefficient of variation (CV): 25.807131
Kurtosis: 11137.295
Mean: 0.47414141
Median Absolute Deviation (MAD): 0
Skewness: 91.93375
Sum: 28164
Variance: 149.72532
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 58643 | 98.7%
6 | 81 | 0.1%
1 | 73 | 0.1%
5 | 46 | 0.1%
8 | 46 | 0.1%
32 | 40 | 0.1%
45 | 36 | 0.1%
15 | 35 | 0.1%
39 | 30 | 0.1%
93 | 28 | < 0.1%
Other values (55) | 342 | 0.6%

Value | Count | Frequency (%)
0 | 58643 | 98.7%
1 | 73 | 0.1%
2 | 23 | < 0.1%
3 | 27 | < 0.1%
4 | 20 | < 0.1%
5 | 46 | 0.1%
6 | 81 | 0.1%
7 | 26 | < 0.1%
8 | 46 | 0.1%
9 | 4 | < 0.1%

Value | Count | Frequency (%)
1776 | 1 | < 0.1%
1402 | 1 | < 0.1%
755 | 1 | < 0.1%
698 | 1 | < 0.1%
672 | 1 | < 0.1%
668 | 1 | < 0.1%
450 | 1 | < 0.1%
300 | 1 | < 0.1%
280 | 1 | < 0.1%
240 | 1 | < 0.1%

basin
Categorical

High correlation 

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB
Lake Victoria: 10248
Pangani: 8940
Rufiji: 7976
Internal: 7785
Lake Tanganyika: 6432
Other values (4): 18019

Length

Max length: 23
Median length: 11
Mean length: 10.892357
Min length: 6

Characters and Unicode

Total characters: 647006
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lake Nyasa
2nd row: Lake Victoria
3rd row: Pangani
4th row: Ruvuma / Southern Coast
5th row: Lake Victoria

Common Values

Value | Count | Frequency (%)
Lake Victoria | 10248 | 17.3%
Pangani | 8940 | 15.1%
Rufiji | 7976 | 13.4%
Internal | 7785 | 13.1%
Lake Tanganyika | 6432 | 10.8%
Wami / Ruvu | 5987 | 10.1%
Lake Nyasa | 5085 | 8.6%
Ruvuma / Southern Coast | 4493 | 7.6%
Lake Rukwa | 2454 | 4.1%
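
The basin summary near the top of this variable lists five values plus "Other values (4)", while the table above lists all nine. A minimal sketch of that roll-up, using the counts from the table, is shown below; the function name `top_n_with_other` is hypothetical.

```python
def top_n_with_other(counts, n):
    """Keep the n most frequent values and fold the rest into one bucket.

    Returns (head, (number_of_folded_values, their_total_count)).
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    other = (len(tail), sum(count for _, count in tail))
    return head, other


# Counts copied from the basin "Common Values" table above.
basin_counts = {
    "Lake Victoria": 10248,
    "Pangani": 8940,
    "Rufiji": 7976,
    "Internal": 7785,
    "Lake Tanganyika": 6432,
    "Wami / Ruvu": 5987,
    "Lake Nyasa": 5085,
    "Ruvuma / Southern Coast": 4493,
    "Lake Rukwa": 2454,
}

head, (n_other, other_total) = top_n_with_other(basin_counts, 5)
for name, count in head:
    print(f"{name}: {count}")
print(f"Other values ({n_other}): {other_total}")
```

The folded bucket comes out as four values totalling 18019, matching the summary, and the nine counts sum to the 59400 observations.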

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
lake | 24219 | 22.2%
/ | 10480 | 9.6%
victoria | 10248 | 9.4%
pangani | 8940 | 8.2%
rufiji | 7976 | 7.3%
internal | 7785 | 7.1%
tanganyika | 6432 | 5.9%
wami | 5987 | 5.5%
ruvu | 5987 | 5.5%
nyasa | 5085 | 4.7%
Other values (4) | 15933 | 14.6%

Most occurring characters

Value | Count | Frequency (%)
a | 107025 | 16.5%
i | 57807 | 8.9%
n | 50807 | 7.9%
(space) | 49672 | 7.7%
e | 36497 | 5.6%
u | 35883 | 5.5%
k | 33105 | 5.1%
t | 27019 | 4.2%
L | 24219 | 3.7%
r | 22526 | 3.5%
Other values (22) | 202446 | 31.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 647006 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 647006 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 647006 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

Distinct: 19287
Distinct (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 464.2 KiB

Length

Max length: 30
Median length: 27
Mean length: 7.8975927
Min length: 1

Characters and Unicode

Total characters: 466187
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9424
Unique (%): 16.0%

Sample

1st row: Mnyusi B
2nd row: Nyamara
3rd row: Majengo
4th row: Mahakamani
5th row: Kyanyamisa

Value | Count | Frequency (%)
a | 2387 | 3.4%
b | 2043 | 2.9%
kati | 1902 | 2.7%
majengo | 610 | 0.9%
wa | 600 | 0.8%
shuleni | 593 | 0.8%
madukani | 569 | 0.8%
mtaa | 514 | 0.7%
juu | 403 | 0.6%
mjini | 378 | 0.5%
Other values (17024) | 60795 | 85.9%

Most occurring characters

Value | Count | Frequency (%)
a | 72003 | 15.4%
i | 45666 | 9.8%
n | 33499 | 7.2%
u | 26424 | 5.7%
e | 25671 | 5.5%
o | 23556 | 5.1%
M | 20431 | 4.4%
g | 18951 | 4.1%
l | 16372 | 3.5%
m | 15053 | 3.2%
Other values (63) | 168561 | 36.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 466187 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 466187 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 466187 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

region
Categorical

High correlation 

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB
Iringa: 5294
Shinyanga: 4982
Mbeya: 4639
Kilimanjaro: 4379
Morogoro: 4006
Other values (16): 36100

Length

Max length: 13
Median length: 11
Mean length: 6.6237542
Min length: 4

Characters and Unicode

Total characters: 393451
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Iringa
2nd row: Mara
3rd row: Manyara
4th row: Mtwara
5th row: Kagera

Common Values

Value | Count | Frequency (%)
Iringa | 5294 | 8.9%
Shinyanga | 4982 | 8.4%
Mbeya | 4639 | 7.8%
Kilimanjaro | 4379 | 7.4%
Morogoro | 4006 | 6.7%
Arusha | 3350 | 5.6%
Kagera | 3316 | 5.6%
Mwanza | 3102 | 5.2%
Kigoma | 2816 | 4.7%
Ruvuma | 2640 | 4.4%
Other values (11) | 20876 | 35.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
iringa | 5294 | 8.7%
shinyanga | 4982 | 8.2%
mbeya | 4639 | 7.6%
kilimanjaro | 4379 | 7.2%
morogoro | 4006 | 6.6%
arusha | 3350 | 5.5%
kagera | 3316 | 5.4%
mwanza | 3102 | 5.1%
kigoma | 2816 | 4.6%
ruvuma | 2640 | 4.3%
Other values (13) | 22486 | 36.9%

Most occurring characters

Value | Count | Frequency (%)
a | 83413 | 21.2%
n | 33143 | 8.4%
r | 32397 | 8.2%
i | 31763 | 8.1%
o | 29580 | 7.5%
g | 25054 | 6.4%
M | 17029 | 4.3%
m | 12841 | 3.3%
y | 11204 | 2.8%
K | 10511 | 2.7%
Other values (22) | 106516 | 27.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 393451 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 393451 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 393451 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

region_code
Real number (ℝ)

High correlation 

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.297003
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
median: 12
Q3: 17
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.587406
Coefficient of variation (CV): 1.1497289
Kurtosis: 10.288433
Mean: 15.297003
Median Absolute Deviation (MAD): 6
Skewness: 3.1738181
Sum: 908642
Variance: 309.31686
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
Value | Count | Frequency (%)
11 | 5300 | 8.9%
17 | 5011 | 8.4%
12 | 4639 | 7.8%
3 | 4379 | 7.4%
5 | 4040 | 6.8%
18 | 3324 | 5.6%
19 | 3047 | 5.1%
2 | 3024 | 5.1%
16 | 2816 | 4.7%
10 | 2640 | 4.4%
Other values (17) | 21180 | 35.7%

Value | Count | Frequency (%)
1 | 2201 | 3.7%
2 | 3024 | 5.1%
3 | 4379 | 7.4%
4 | 2513 | 4.2%
5 | 4040 | 6.8%
6 | 1609 | 2.7%
7 | 805 | 1.4%
8 | 300 | 0.5%
9 | 390 | 0.7%
10 | 2640 | 4.4%

Value | Count | Frequency (%)
99 | 423 | 0.7%
90 | 917 | 1.5%
80 | 1238 | 2.1%
60 | 1025 | 1.7%
40 | 1 | < 0.1%
24 | 326 | 0.5%
21 | 1583 | 2.7%
20 | 1969 | 3.3%
19 | 3047 | 5.1%
18 | 3324 | 5.6%

district_code
Real number (ℝ)

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6297475
Minimum: 0
Maximum: 80
Zeros: 23
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3
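
Quantile statistics like those above depend on how positions between two data points are handled. The sketch below uses linear interpolation between the nearest order statistics. That choice is an assumption here (it is the default in common data libraries), and the toy codes are invented rather than taken from the dataset.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the nearest order statistics."""
    if not values:
        raise ValueError("empty data")
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    data = sorted(values)
    pos = q * (len(data) - 1)           # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])


# Toy data (hypothetical): small codes with two large outliers.
toy_codes = [1, 1, 2, 2, 3, 3, 4, 5, 30, 80]
q1, q3 = quantile(toy_codes, 0.25), quantile(toy_codes, 0.75)
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

Interpolation is why a quantile can fall between observed values even when the column holds only integers.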

Descriptive statistics

Standard deviation: 9.6336486
Coefficient of variation (CV): 1.7112044
Kurtosis: 16.214284
Mean: 5.6297475
Median Absolute Deviation (MAD): 1
Skewness: 3.9620453
Sum: 334407
Variance: 92.807186
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value | Count | Frequency (%)
1 | 12203 | 20.5%
2 | 11173 | 18.8%
3 | 9998 | 16.8%
4 | 8999 | 15.1%
5 | 4356 | 7.3%
6 | 4074 | 6.9%
7 | 3343 | 5.6%
8 | 1043 | 1.8%
30 | 995 | 1.7%
33 | 874 | 1.5%
Other values (10) | 2342 | 3.9%

Value | Count | Frequency (%)
0 | 23 | < 0.1%
1 | 12203 | 20.5%
2 | 11173 | 18.8%
3 | 9998 | 16.8%
4 | 8999 | 15.1%
5 | 4356 | 7.3%
6 | 4074 | 6.9%
7 | 3343 | 5.6%
8 | 1043 | 1.8%
13 | 391 | 0.7%

Value | Count | Frequency (%)
80 | 12 | < 0.1%
67 | 6 | < 0.1%
63 | 195 | 0.3%
62 | 109 | 0.2%
60 | 63 | 0.1%
53 | 745 | 1.3%
43 | 505 | 0.9%
33 | 874 | 1.5%
30 | 995 | 1.7%
23 | 293 | 0.5%

lga
Text

Distinct: 125
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 16
Median length: 14
Mean length: 7.4168855
Min length: 3

Characters and Unicode

Total characters: 440563
Distinct characters: 41
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Ludewa
2nd row: Serengeti
3rd row: Simanjiro
4th row: Nanyumbu
5th row: Karagwe

Value | Count | Frequency (%)
rural | 9552 | 13.5%
njombe | 2503 | 3.5%
urban | 1683 | 2.4%
moshi | 1330 | 1.9%
arusha | 1315 | 1.9%
bariadi | 1177 | 1.7%
singida | 1172 | 1.7%
rungwe | 1106 | 1.6%
kilosa | 1094 | 1.5%
kasulu | 1047 | 1.5%
Other values (106) | 48656 | 68.9%

Most occurring characters

Value | Count | Frequency (%)
a | 69982 | 15.9%
o | 30079 | 6.8%
i | 29483 | 6.7%
u | 28324 | 6.4%
r | 26886 | 6.1%
e | 22579 | 5.1%
n | 22521 | 5.1%
l | 19238 | 4.4%
g | 18385 | 4.2%
M | 16017 | 3.6%
Other values (31) | 157069 | 35.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 440563 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 440563 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 440563 | 100.0%

Most frequent character per category, script, and block

All characters fall into a single (unknown) category, script, and block, so each of these three tables is identical to the "Most occurring characters" table.

ward
Text

Distinct: 2092
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 23
Median length: 19
Mean length: 7.5058418
Min length: 3

Characters and Unicode

Total characters: 445847
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 0.1%

Sample

1st row: Mundindi
2nd row: Natta
3rd row: Ngorika
4th row: Nanyumbu
5th row: Nyakasimbi

Value | Count | Frequency (%)
mashariki | 580 | 0.9%
urban | 540 | 0.8%
siha | 434 | 0.7%
kusini | 393 | 0.6%
magharibi | 362 | 0.6%
igosi | 307 | 0.5%
masama | 303 | 0.5%
machame | 293 | 0.5%
kati | 270 | 0.4%
imalinyi | 252 | 0.4%
Other values (2106) | 61033 | 94.2%

Most occurring characters

Value              Count   Frequency (%)
a                  69533   15.6%
i                  40243   9.0%
n                  29584   6.6%
u                  27015   6.1%
o                  26093   5.9%
e                  23589   5.3%
g                  21166   4.7%
M                  18916   4.2%
m                  16216   3.6%
l                  15799   3.5%
Other values (44)  157693  35.4%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  445847  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  445847  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  445847  100.0%

population
Real number (ℝ)

High correlation  Zeros 

Distinct: 1049
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.90998
Minimum: 0
Maximum: 30500
Zeros: 21381
Zeros (%): 36.0%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 25
Q3: 215
95-th percentile: 680
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.48218
Coefficient of variation (CV): 2.620656
Kurtosis: 402.28012
Mean: 179.90998
Median Absolute Deviation (MAD): 25
Skewness: 12.660714
Sum: 10686653
Variance: 222295.44
Monotonicity: Not monotonic
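
Several of the population figures above can be cross-checked against each other: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. The following is a minimal sketch that uses only values reported for this variable plus the observation count of 59400 from the report overview; the variable names are illustrative.

```python
# Values transcribed from the population statistics in this report.
n = 59400            # number of observations (report overview)
total = 10686653     # reported Sum
std = 471.48218      # reported standard deviation
q1, q3 = 0, 215      # reported quartiles

mean = total / n         # should agree with the reported Mean
cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
iqr = q3 - q1            # interquartile range

print(mean, cv, variance, iqr)
```

Small differences in the last digits are expected because the report rounds the values it displays.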
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    21381  36.0%
1                    7025   11.8%
200                  1940   3.3%
150                  1892   3.2%
250                  1681   2.8%
300                  1476   2.5%
100                  1146   1.9%
50                   1139   1.9%
500                  1009   1.7%
350                  986    1.7%
Other values (1039)  19725  33.2%

Minimum 10 values

Value  Count  Frequency (%)
0      21381  36.0%
1      7025   11.8%
2      4      < 0.1%
3      4      < 0.1%
4      13     < 0.1%
5      44     0.1%
6      19     < 0.1%
7      3      < 0.1%
8      23     < 0.1%
9      11     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
30500  1      < 0.1%
15300  1      < 0.1%
11463  1      < 0.1%
10000  3      < 0.1%
9865   1      < 0.1%
9500   1      < 0.1%
9000   3      < 0.1%
8848   1      < 0.1%
8600   1      < 0.1%
8500   1      < 0.1%

public_meeting
Boolean

Imbalance  Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 3334
Missing (%): 5.6%
Memory size: 464.2 KiB

Value      Count  Frequency (%)
True       51011  85.9%
False      5055   8.5%
(Missing)  3334   5.6%

recorded_by
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 23
Median length: 23
Mean length: 23
Min length: 23

Characters and Unicode

Total characters: 1366200
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GeoData Consultants Ltd
2nd row: GeoData Consultants Ltd
3rd row: GeoData Consultants Ltd
4th row: GeoData Consultants Ltd
5th row: GeoData Consultants Ltd

Common Values

Value                    Count  Frequency (%)
GeoData Consultants Ltd  59400  100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value        Count  Frequency (%)
geodata      59400  33.3%
consultants  59400  33.3%
ltd          59400  33.3%

Most occurring characters

Value             Count   Frequency (%)
t                 237600  17.4%
a                 178200  13.0%
o                 118800  8.7%
(space)           118800  8.7%
n                 118800  8.7%
s                 118800  8.7%
G                 59400   4.3%
e                 59400   4.3%
D                 59400   4.3%
C                 59400   4.3%
Other values (4)  237600  17.4%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1366200  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1366200  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1366200  100.0%

scheme_management
Categorical

High correlation  Missing 

Distinct: 11
Distinct (%): < 0.1%
Missing: 3878
Missing (%): 6.5%
Memory size: 464.2 KiB

Length

Max length: 16
Median length: 3
Mean length: 4.6447354
Min length: 3

Characters and Unicode

Total characters: 257885
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: VWC
2nd row: Other
3rd row: VWC
4th row: VWC
5th row: VWC

Common Values

Value             Count  Frequency (%)
VWC               36793  61.9%
WUG               5206   8.8%
Water authority   3153   5.3%
WUA               2883   4.9%
Water Board       2748   4.6%
Parastatal        1680   2.8%
Private operator  1063   1.8%
Company           1061   1.8%
Other             766    1.3%
SWC               97     0.2%
(Missing)         3878   6.5%

Length

[Figure: histogram of lengths of the category]

Value             Count  Frequency (%)
vwc               36793  58.9%
water             5901   9.4%
wug               5206   8.3%
authority         3153   5.0%
wua               2883   4.6%
board             2748   4.4%
parastatal        1680   2.7%
private           1063   1.7%
operator          1063   1.7%
company           1061   1.7%
Other values (3)  935    1.5%

Most occurring characters

Value              Count  Frequency (%)
W                  50880  19.7%
C                  37951  14.7%
V                  36793  14.3%
a                  21709  8.4%
t                  18531  7.2%
r                  17509  6.8%
o                  9088   3.5%
e                  8793   3.4%
U                  8089   3.1%
(space)            6964   2.7%
Other values (18)  41578  16.1%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  257885  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  257885  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  257885  100.0%

scheme_name
Text

Missing 

Distinct: 2695
Distinct (%): 8.8%
Missing: 28810
Missing (%): 48.5%
Memory size: 464.2 KiB

Length

Max length: 46
Median length: 37
Mean length: 14.522164
Min length: 1

Characters and Unicode

Total characters: 444233
Distinct characters: 68
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
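
The length and character totals for scheme_name are consistent with the mean length being taken over non-missing rows only: dividing the total character count by the number of rows that have a value reproduces the reported mean. The sketch below checks this using figures from this report (59400 observations from the overview, 28810 missing, 444233 total characters); the variable names are illustrative.

```python
# Figures transcribed from the report for scheme_name.
n_rows = 59400        # observations (report overview)
n_missing = 28810     # reported Missing
total_chars = 444233  # reported Total characters

non_missing = n_rows - n_missing
missing_pct = 100 * n_missing / n_rows
mean_length = total_chars / non_missing  # mean over rows that have a value

print(non_missing, round(missing_pct, 1), mean_length)
```

Dividing by all 59400 rows instead would give a mean of roughly 7.5 characters, well below the reported 14.522164.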

Unique

Unique: 712
Unique (%): 2.3%

Sample

1st row: Roman
2nd row: Nyumba ya mungu pipe scheme
3rd row: Zingibali
4th row: BL Bondeni
5th row: wanging'ombe water supply s

Value                Count  Frequency (%)
water                9770   13.7%
supply               6745   9.5%
scheme               2532   3.5%
wa                   2157   3.0%
gravity              1914   2.7%
pipe                 1346   1.9%
maji                 1343   1.9%
mradi                1097   1.5%
line                 1016   1.4%
supplied             877    1.2%
Other values (2506)  42575  59.7%

Most occurring characters

Value              Count   Frequency (%)
a                  48584   10.9%
(space)            41252   9.3%
e                  34595   7.8%
i                  26411   5.9%
p                  22451   5.1%
r                  21816   4.9%
t                  19216   4.3%
u                  18441   4.2%
l                  17308   3.9%
n                  17116   3.9%
Other values (58)  177043  39.9%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  444233  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  444233  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  444233  100.0%

permit
Boolean

Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 3056
Missing (%): 5.1%
Memory size: 464.2 KiB

Value      Count  Frequency (%)
True       38852  65.4%
False      17492  29.4%
(Missing)  3056   5.1%

construction_year
Real number (ℝ)

High correlation  Zeros 

Distinct: 55
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1300.6525
Minimum: 0
Maximum: 2013
Zeros: 20709
Zeros (%): 34.9%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1986
Q3: 2004
95-th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 951.62055
Coefficient of variation (CV): 0.73164859
Kurtosis: -1.5964324
Mean: 1300.6525
Median Absolute Deviation (MAD): 22
Skewness: -0.63492779
Sum: 77258757
Variance: 905581.67
Monotonicity: Not monotonic
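
With 34.9% of rows equal to 0 and the value tables for this variable listing nothing between 0 and 1960, the reported mean of 1300.6525 does not correspond to any recorded year. Assuming that 0 is a placeholder for an unrecorded year (an interpretation, not something the report states), the mean over the remaining rows can be recovered from the reported sum and zero count alone. The variable names below are illustrative.

```python
# Figures transcribed from the report for construction_year.
n_rows = 59400     # observations (report overview)
n_zeros = 20709    # reported Zeros
total = 77258757   # reported Sum; zero rows contribute nothing to it

nonzero_rows = n_rows - n_zeros
mean_all = total / n_rows            # matches the reported Mean
mean_nonzero = total / nonzero_rows  # mean year, ASSUMING 0 means "unknown"

print(nonzero_rows, mean_all, mean_nonzero)
```

Under that assumption the mean of the non-zero rows comes out just under 1997, inside the 1960 to 2013 range of recorded years.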
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0                  20709  34.9%
2010               2645   4.5%
2008               2613   4.4%
2009               2533   4.3%
2000               2091   3.5%
2007               1587   2.7%
2006               1471   2.5%
2003               1286   2.2%
2011               1256   2.1%
2004               1123   1.9%
Other values (45)  22086  37.2%

Minimum 10 values

Value  Count  Frequency (%)
0      20709  34.9%
1960   102    0.2%
1961   21     < 0.1%
1962   30     0.1%
1963   85     0.1%
1964   40     0.1%
1965   19     < 0.1%
1966   17     < 0.1%
1967   88     0.1%
1968   77     0.1%

Maximum 10 values

Value  Count  Frequency (%)
2013   176    0.3%
2012   1084   1.8%
2011   1256   2.1%
2010   2645   4.5%
2009   2533   4.3%
2008   2613   4.4%
2007   1587   2.7%
2006   1471   2.5%
2005   1011   1.7%
2004   1123   1.9%

extraction_type
Categorical

High correlation 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 25
Median length: 17
Mean length: 7.7195118
Min length: 3

Characters and Unicode

Total characters: 458539
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: gravity
3rd row: gravity
4th row: submersible
5th row: gravity

Common Values

Value              Count  Frequency (%)
gravity            26780  45.1%
nira/tanira        8154   13.7%
other              6430   10.8%
submersible        4764   8.0%
swn 80             3670   6.2%
mono               2865   4.8%
india mark ii      2400   4.0%
afridev            1770   3.0%
ksb                1415   2.4%
other - rope pump  451    0.8%
Other values (8)   701    1.2%

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
gravity            26780  38.1%
nira/tanira        8154   11.6%
other              7197   10.2%
submersible        4764   6.8%
swn                3899   5.5%
80                 3670   5.2%
mono               2865   4.1%
india              2498   3.6%
mark               2498   3.6%
ii                 2400   3.4%
Other values (13)  5640   8.0%

Most occurring characters

Value              Count  Frequency (%)
i                  60078  13.1%
r                  59768  13.0%
a                  58179  12.7%
t                  42131  9.2%
v                  28550  6.2%
y                  26867  5.9%
g                  26782  5.8%
n                  25691  5.6%
e                  19036  4.2%
s                  14844  3.2%
Other values (19)  96613  21.1%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  458539  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  458539  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  458539  100.0%

extraction_type_group
Categorical

High correlation 

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 15
Median length: 14
Mean length: 7.8805387
Min length: 4

Characters and Unicode

Total characters: 468104
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: gravity
3rd row: gravity
4th row: submersible
5th row: gravity

Common Values

Value             Count  Frequency (%)
gravity           26780  45.1%
nira/tanira       8154   13.7%
other             6430   10.8%
submersible       6179   10.4%
swn 80            3670   6.2%
mono              2865   4.8%
india mark ii     2400   4.0%
afridev           1770   3.0%
rope pump         451    0.8%
other handpump    364    0.6%
Other values (3)  337    0.6%

Length

[Figure: histogram of lengths of the category]

Value             Count  Frequency (%)
gravity           26780  38.8%
nira/tanira       8154   11.8%
other             6916   10.0%
submersible       6179   9.0%
swn               3670   5.3%
80                3670   5.3%
mono              2865   4.2%
mark              2498   3.6%
india             2498   3.6%
ii                2400   3.5%
Other values (7)  3373   4.9%

Most occurring characters

Value              Count  Frequency (%)
i                  61244  13.1%
r                  61141  13.1%
a                  58372  12.5%
t                  41972  9.0%
v                  28550  6.1%
g                  26780  5.7%
y                  26780  5.7%
n                  25822  5.5%
e                  21729  4.6%
s                  16028  3.4%
Other values (16)  99686  21.3%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  468104  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  468104  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  468104  100.0%

extraction_type_class
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 12
Median length: 11
Mean length: 7.6022391
Min length: 5

Characters and Unicode

Total characters: 451573
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: gravity
3rd row: gravity
4th row: submersible
5th row: gravity

Common Values

Value         Count  Frequency (%)
gravity       26780  45.1%
handpump      16456  27.7%
other         6430   10.8%
submersible   6179   10.4%
motorpump     2987   5.0%
rope pump     451    0.8%
wind-powered  117    0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value         Count  Frequency (%)
gravity       26780  44.7%
handpump      16456  27.5%
other         6430   10.7%
submersible   6179   10.3%
motorpump     2987   5.0%
rope          451    0.8%
pump          451    0.8%
wind-powered  117    0.2%

Most occurring characters

Value              Count   Frequency (%)
a                  43236   9.6%
r                  42944   9.5%
p                  40356   8.9%
t                  36197   8.0%
i                  33076   7.3%
m                  29060   6.4%
g                  26780   5.9%
y                  26780   5.9%
v                  26780   5.9%
u                  26073   5.8%
Other values (11)  120291  26.6%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  451573  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  451573  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  451573  100.0%

management
Categorical

High correlation 

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 16
Median length: 3
Mean length: 4.3506397
Min length: 3

Characters and Unicode

Total characters: 258428
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: vwc
2nd row: wug
3rd row: vwc
4th row: vwc
5th row: other

Common Values

Value             Count  Frequency (%)
vwc               40507  68.2%
wug               6515   11.0%
water board       2933   4.9%
wua               2535   4.3%
private operator  1971   3.3%
parastatal        1768   3.0%
water authority   904    1.5%
other             844    1.4%
company           685    1.2%
unknown           561    0.9%
Other values (2)  177    0.3%

Length

[Figure: histogram of lengths of the category]

Value             Count  Frequency (%)
vwc               40507  61.9%
wug               6515   10.0%
water             3837   5.9%
board             2933   4.5%
wua               2535   3.9%
private           1971   3.0%
operator          1971   3.0%
parastatal        1768   2.7%
other             943    1.4%
authority         904    1.4%
Other values (5)  1522   2.3%

Most occurring characters

Value              Count  Frequency (%)
w                  53955  20.9%
v                  42478  16.4%
c                  41291  16.0%
a                  21908  8.5%
r                  16376  6.3%
t                  14222  5.5%
u                  10593  4.1%
o                  10166  3.9%
e                  8722   3.4%
g                  6515   2.5%
Other values (13)  32202  12.5%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  258428  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  258428  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  258428  100.0%

management_group
Categorical

High correlation  Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.8922896
Min length: 5

Characters and Unicode

Total characters: 587602
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: user-group
2nd row: user-group
3rd row: user-group
4th row: user-group
5th row: other

Common Values

Value       Count  Frequency (%)
user-group  52490  88.4%
commercial  3638   6.1%
parastatal  1768   3.0%
other       943    1.6%
unknown     561    0.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value       Count  Frequency (%)
user-group  52490  88.4%
commercial  3638   6.1%
parastatal  1768   3.0%
other       943    1.6%
unknown     561    0.9%

Most occurring characters

Value             Count   Frequency (%)
r                 111329  18.9%
u                 105541  18.0%
o                 57632   9.8%
e                 57071   9.7%
s                 54258   9.2%
p                 54258   9.2%
-                 52490   8.9%
g                 52490   8.9%
a                 10710   1.8%
m                 7276    1.2%
Other values (8)  24547   4.2%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  587602  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  587602  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  587602  100.0%

payment
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 21
Median length: 14
Mean length: 10.664798
Min length: 5

Characters and Unicode

Total characters: 633489
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: pay annually
2nd row: never pay
3rd row: pay per bucket
4th row: never pay
5th row: never pay

Common Values

Value                  Count  Frequency (%)
never pay              25348  42.7%
pay per bucket         8985   15.1%
pay monthly            8300   14.0%
unknown                8157   13.7%
pay when scheme fails  3914   6.6%
pay annually           3642   6.1%
other                  1054   1.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value     Count  Frequency (%)
pay       50189  39.7%
never     25348  20.1%
per       8985   7.1%
bucket    8985   7.1%
monthly   8300   6.6%
unknown   8157   6.5%
when      3914   3.1%
scheme    3914   3.1%
fails     3914   3.1%
annually  3642   2.9%

Most occurring characters

Value              Count   Frequency (%)
e                  81462   12.9%
n                  69317   10.9%
(space)            67002   10.6%
y                  62131   9.8%
a                  61387   9.7%
p                  59174   9.3%
r                  35387   5.6%
v                  25348   4.0%
u                  20784   3.3%
l                  19498   3.1%
Other values (11)  131999  20.8%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  633489  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  633489  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  633489  100.0%

payment_type
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 8.5307576
Min length: 5

Characters and Unicode

Total characters: 506727
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: annually
2nd row: never pay
3rd row: per bucket
4th row: never pay
5th row: never pay

Common Values

Value       Count  Frequency (%)
never pay   25348  42.7%
per bucket  8985   15.1%
monthly     8300   14.0%
unknown     8157   13.7%
on failure  3914   6.6%
annually    3642   6.1%
other       1054   1.8%
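
The High correlation alert between payment and payment_type can be sanity-checked from the two Common Values tables: the counts line up exactly, which suggests (but does not prove, since only aggregate counts are compared) that payment_type is a relabelled copy of payment. The sketch below uses counts transcribed from this report; the pairing in `assumed_mapping` is a guess based on matching counts and wording, not something the report states.

```python
# Counts transcribed from the Common Values tables of the two variables.
payment = {
    "never pay": 25348, "pay per bucket": 8985, "pay monthly": 8300,
    "unknown": 8157, "pay when scheme fails": 3914, "pay annually": 3642,
    "other": 1054,
}
payment_type = {
    "never pay": 25348, "per bucket": 8985, "monthly": 8300,
    "unknown": 8157, "on failure": 3914, "annually": 3642,
    "other": 1054,
}

# ASSUMPTION: label pairing inferred from identical counts and similar wording.
assumed_mapping = {
    "never pay": "never pay",
    "pay per bucket": "per bucket",
    "pay monthly": "monthly",
    "unknown": "unknown",
    "pay when scheme fails": "on failure",
    "pay annually": "annually",
    "other": "other",
}

# Collect any label pair whose counts disagree.
mismatches = {
    a: (payment[a], payment_type[b])
    for a, b in assumed_mapping.items()
    if payment[a] != payment_type[b]
}
print(mismatches, sum(payment.values()))
```

If the row-level data confirmed the same pairing, one of the two columns could be dropped without losing information; that check needs the dataset itself rather than this report.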

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value     Count  Frequency (%)
never     25348  26.0%
pay       25348  26.0%
per       8985   9.2%
bucket    8985   9.2%
monthly   8300   8.5%
unknown   8157   8.4%
on        3914   4.0%
failure   3914   4.0%
annually  3642   3.7%
other     1054   1.1%

Most occurring characters

Value              Count   Frequency (%)
e                  73634   14.5%
n                  69317   13.7%
r                  39301   7.8%
(space)            38247   7.5%
y                  37290   7.4%
a                  36546   7.2%
p                  34333   6.8%
v                  25348   5.0%
u                  24698   4.9%
o                  21425   4.2%
Other values (10)  106588  21.0%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  506727  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  506727  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  506727  100.0%

water_quality
Categorical

High correlation  Imbalance 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 18
Median length: 4
Mean length: 4.3032828
Min length: 4

Characters and Unicode

Total characters: 255615
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: soft
2nd row: soft
3rd row: soft
4th row: soft
5th row: soft

Common Values

Value               Count  Frequency (%)
soft                50818  85.6%
salty               4856   8.2%
unknown             1876   3.2%
milky               804    1.4%
coloured            490    0.8%
salty abandoned     339    0.6%
fluoride            200    0.3%
fluoride abandoned  17     < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value      Count  Frequency (%)
soft       50818  85.0%
salty      5195   8.7%
unknown    1876   3.1%
milky      804    1.3%
coloured   490    0.8%
abandoned  356    0.6%
fluoride   217    0.4%

Most occurring characters

Value             Count  Frequency (%)
s                 56013  21.9%
t                 56013  21.9%
o                 54247  21.2%
f                 51035  20.0%
l                 6706   2.6%
n                 6340   2.5%
y                 5999   2.3%
a                 5907   2.3%
k                 2680   1.0%
u                 2583   1.0%
Other values (9)  8092   3.2%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  255615  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  255615  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  255615  100.0%

quality_group
Categorical

High correlation  Imbalance 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Length

Max length: 8
Median length: 4
Mean length: 4.235101
Min length: 4

Characters and Unicode

Total characters: 251565
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: good
2nd row: good
3rd row: good
4th row: good
5th row: good

Common Values

Value     Count  Frequency (%)
good      50818  85.6%
salty      5195   8.7%
unknown    1876   3.2%
milky       804   1.4%
colored     490   0.8%
fluoride    217   0.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
good 50818
85.6%
salty 5195
 
8.7%
unknown 1876
 
3.2%
milky 804
 
1.4%
colored 490
 
0.8%
fluoride 217
 
0.4%

Most occurring characters

ValueCountFrequency (%)
o 104709
41.6%
d 51525
20.5%
g 50818
20.2%
l 6706
 
2.7%
y 5999
 
2.4%
n 5628
 
2.2%
t 5195
 
2.1%
a 5195
 
2.1%
s 5195
 
2.1%
k 2680
 
1.1%
Other values (8) 7915
 
3.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 251565
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
o 104709
41.6%
d 51525
20.5%
g 50818
20.2%
l 6706
 
2.7%
y 5999
 
2.4%
n 5628
 
2.2%
t 5195
 
2.1%
a 5195
 
2.1%
s 5195
 
2.1%
k 2680
 
1.1%
Other values (8) 7915
 
3.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 251565
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
o 104709
41.6%
d 51525
20.5%
g 50818
20.2%
l 6706
 
2.7%
y 5999
 
2.4%
n 5628
 
2.2%
t 5195
 
2.1%
a 5195
 
2.1%
s 5195
 
2.1%
k 2680
 
1.1%
Other values (8) 7915
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 251565
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
o 104709
41.6%
d 51525
20.5%
g 50818
20.2%
l 6706
 
2.7%
y 5999
 
2.4%
n 5628
 
2.2%
t 5195
 
2.1%
a 5195
 
2.1%
s 5195
 
2.1%
k 2680
 
1.1%
Other values (8) 7915
 
3.1%

quantity
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

enough        33186
insufficient  15129
dry            6246
seasonal       4050
unknown         789

Length

Max length: 12
Median length: 6
Mean length: 7.3623737
Min length: 3

Characters and Unicode

Total characters: 437325
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: enough
2nd row: insufficient
3rd row: enough
4th row: dry
5th row: seasonal

Common Values

Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry            6246  10.5%
seasonal       4050   6.8%
unknown         789   1.3%
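
The frequency column is each count divided by the number of observations. A minimal sketch that recomputes it for quantity from the counts above; the variable names are illustrative and the one-decimal rounding is assumed from how the report prints its percentages.

```python
# Value counts for quantity, copied from the report.
quantity_counts = {
    "enough": 33186,
    "insufficient": 15129,
    "dry": 6246,
    "seasonal": 4050,
    "unknown": 789,
}

total = sum(quantity_counts.values())

# Share of all observations, formatted to one decimal place.
frequencies = {
    value: f"{count / total:.1%}" for value, count in quantity_counts.items()
}

for value, share in frequencies.items():
    print(f"{value:<13}{quantity_counts[value]:>6}  {share}")
```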

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
enough 33186
55.9%
insufficient 15129
25.5%
dry 6246
 
10.5%
seasonal 4050
 
6.8%
unknown 789
 
1.3%

Most occurring characters

ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

quantity_group
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

enough        33186
insufficient  15129
dry            6246
seasonal       4050
unknown         789

Length

Max length: 12
Median length: 6
Mean length: 7.3623737
Min length: 3

Characters and Unicode

Total characters: 437325
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: enough
2nd row: insufficient
3rd row: enough
4th row: dry
5th row: seasonal

Common Values

Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry            6246  10.5%
seasonal       4050   6.8%
unknown         789   1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
enough 33186
55.9%
insufficient 15129
25.5%
dry 6246
 
10.5%
seasonal 4050
 
6.8%
unknown 789
 
1.3%

Most occurring characters

ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 437325
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
n 69861
16.0%
e 52365
12.0%
u 49104
11.2%
i 45387
10.4%
o 38025
8.7%
g 33186
7.6%
h 33186
7.6%
f 30258
6.9%
s 23229
 
5.3%
t 15129
 
3.5%
Other values (8) 47595
10.9%

source
Categorical

High correlation 

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

spring                17021
shallow well          16824
machine dbh           11075
river                  9612
rainwater harvesting   2295
Other values (5)       2573

Length

Max length: 20
Median length: 12
Mean length: 8.9788047
Min length: 3

Characters and Unicode

Total characters: 533341
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: rainwater harvesting
3rd row: dam
4th row: machine dbh
5th row: rainwater harvesting

Common Values

Value                 Count  Frequency (%)
spring                17021  28.7%
shallow well          16824  28.3%
machine dbh           11075  18.6%
river                  9612  16.2%
rainwater harvesting   2295   3.9%
hand dtw                874   1.5%
lake                    765   1.3%
dam                     656   1.1%
other                   212   0.4%
unknown                  66   0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
spring 17021
18.8%
shallow 16824
18.6%
well 16824
18.6%
machine 11075
12.2%
dbh 11075
12.2%
river 9612
10.6%
rainwater 2295
 
2.5%
harvesting 2295
 
2.5%
hand 874
 
1.0%
dtw 874
 
1.0%
Other values (4) 1699
 
1.9%
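
The figures in this plot table differ from the Common Values table because they appear to count individual words rather than whole values: "shallow well" contributes once to "shallow" and once to "well". The sketch below reproduces the counts under that assumption, splitting on whitespace; the tokenisation rule is inferred from the numbers, not documented in the report.

```python
from collections import Counter

# Whole-value counts for source, copied from the Common Values table.
value_counts = {
    "spring": 17021,
    "shallow well": 16824,
    "machine dbh": 11075,
    "river": 9612,
    "rainwater harvesting": 2295,
    "hand dtw": 874,
    "lake": 765,
    "dam": 656,
    "other": 212,
    "unknown": 66,
}

# Credit every whitespace-separated word with the count of its value.
word_counts = Counter()
for value, n in value_counts.items():
    for word in value.split():
        word_counts[word] += n

total_words = sum(word_counts.values())
spring_share = f"{word_counts['spring'] / total_words:.1%}"

print(word_counts.most_common(3), total_words, spring_share)
```

With 90468 words in total, "spring" accounts for 18.8% of words even though it is 28.7% of rows.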

Most occurring characters

ValueCountFrequency (%)
l 68061
12.8%
r 43342
 
8.1%
e 43078
 
8.1%
h 42355
 
7.9%
i 42298
 
7.9%
a 37079
 
7.0%
w 36883
 
6.9%
s 36140
 
6.8%
n 33758
 
6.3%
(space) 31068
 
5.8%
Other values (11) 119279
22.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 533341
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
l 68061
12.8%
r 43342
 
8.1%
e 43078
 
8.1%
h 42355
 
7.9%
i 42298
 
7.9%
a 37079
 
7.0%
w 36883
 
6.9%
s 36140
 
6.8%
n 33758
 
6.3%
(space) 31068
 
5.8%
Other values (11) 119279
22.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 533341
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
l 68061
12.8%
r 43342
 
8.1%
e 43078
 
8.1%
h 42355
 
7.9%
i 42298
 
7.9%
a 37079
 
7.0%
w 36883
 
6.9%
s 36140
 
6.8%
n 33758
 
6.3%
(space) 31068
 
5.8%
Other values (11) 119279
22.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 533341
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
l 68061
12.8%
r 43342
 
8.1%
e 43078
 
8.1%
h 42355
 
7.9%
i 42298
 
7.9%
a 37079
 
7.0%
w 36883
 
6.9%
s 36140
 
6.8%
n 33758
 
6.3%
(space) 31068
 
5.8%
Other values (11) 119279
22.4%

source_type
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

spring                17021
shallow well          16824
borehole              11949
river/lake            10377
rainwater harvesting   2295
Other values (2)        934

Length

Max length: 20
Median length: 12
Mean length: 9.3036027
Min length: 3

Characters and Unicode

Total characters: 552634
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: rainwater harvesting
3rd row: dam
4th row: borehole
5th row: rainwater harvesting

Common Values

Value                 Count  Frequency (%)
spring                17021  28.7%
shallow well          16824  28.3%
borehole              11949  20.1%
river/lake            10377  17.5%
rainwater harvesting   2295   3.9%
dam                     656   1.1%
other                   278   0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
spring 17021
21.7%
shallow 16824
21.4%
well 16824
21.4%
borehole 11949
15.2%
river/lake 10377
13.2%
rainwater 2295
 
2.9%
harvesting 2295
 
2.9%
dam 656
 
0.8%
other 278
 
0.4%

Most occurring characters

ValueCountFrequency (%)
l 89622
16.2%
e 66344
12.0%
r 56887
10.3%
o 41000
 
7.4%
s 36140
 
6.5%
w 35943
 
6.5%
a 34742
 
6.3%
i 31988
 
5.8%
h 31346
 
5.7%
n 21611
 
3.9%
Other values (10) 107011
19.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 552634
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
l 89622
16.2%
e 66344
12.0%
r 56887
10.3%
o 41000
 
7.4%
s 36140
 
6.5%
w 35943
 
6.5%
a 34742
 
6.3%
i 31988
 
5.8%
h 31346
 
5.7%
n 21611
 
3.9%
Other values (10) 107011
19.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 552634
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
l 89622
16.2%
e 66344
12.0%
r 56887
10.3%
o 41000
 
7.4%
s 36140
 
6.5%
w 35943
 
6.5%
a 34742
 
6.3%
i 31988
 
5.8%
h 31346
 
5.7%
n 21611
 
3.9%
Other values (10) 107011
19.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 552634
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
l 89622
16.2%
e 66344
12.0%
r 56887
10.3%
o 41000
 
7.4%
s 36140
 
6.5%
w 35943
 
6.5%
a 34742
 
6.3%
i 31988
 
5.8%
h 31346
 
5.7%
n 21611
 
3.9%
Other values (10) 107011
19.4%

source_class
Categorical

High correlation 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

groundwater  45794
surface      13328
unknown        278

Length

Max length: 11
Median length: 11
Mean length: 10.083771
Min length: 7

Characters and Unicode

Total characters: 598976
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: groundwater
2nd row: surface
3rd row: surface
4th row: groundwater
5th row: surface

Common Values

Value        Count  Frequency (%)
groundwater  45794  77.1%
surface      13328  22.4%
unknown        278   0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
groundwater 45794
77.1%
surface 13328
 
22.4%
unknown 278
 
0.5%

Most occurring characters

ValueCountFrequency (%)
r 104916
17.5%
u 59400
9.9%
a 59122
9.9%
e 59122
9.9%
n 46628
7.8%
o 46072
7.7%
w 46072
7.7%
g 45794
7.6%
d 45794
7.6%
t 45794
7.6%
Other values (4) 40262
 
6.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 598976
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
r 104916
17.5%
u 59400
9.9%
a 59122
9.9%
e 59122
9.9%
n 46628
7.8%
o 46072
7.7%
w 46072
7.7%
g 45794
7.6%
d 45794
7.6%
t 45794
7.6%
Other values (4) 40262
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 598976
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
r 104916
17.5%
u 59400
9.9%
a 59122
9.9%
e 59122
9.9%
n 46628
7.8%
o 46072
7.7%
w 46072
7.7%
g 45794
7.6%
d 45794
7.6%
t 45794
7.6%
Other values (4) 40262
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 598976
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
r 104916
17.5%
u 59400
9.9%
a 59122
9.9%
e 59122
9.9%
n 46628
7.8%
o 46072
7.7%
w 46072
7.7%
g 45794
7.6%
d 45794
7.6%
t 45794
7.6%
Other values (4) 40262
 
6.7%

waterpoint_type
Categorical

High correlation 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

communal standpipe           28522
hand pump                    17488
other                         6380
communal standpipe multiple   6103
improved spring                784
Other values (2)               123

Length

Max length: 27
Median length: 18
Mean length: 14.827576
Min length: 3

Characters and Unicode

Total characters: 880758
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe multiple
4th row: communal standpipe multiple
5th row: communal standpipe

Common Values

Value                        Count  Frequency (%)
communal standpipe           28522  48.0%
hand pump                    17488  29.4%
other                         6380  10.7%
communal standpipe multiple   6103  10.3%
improved spring                784   1.3%
cattle trough                  116   0.2%
dam                              7  < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
communal 34625
29.2%
standpipe 34625
29.2%
hand 17488
14.8%
pump 17488
14.8%
other 6380
 
5.4%
multiple 6103
 
5.1%
improved 784
 
0.7%
spring 784
 
0.7%
cattle 116
 
0.1%
trough 116
 
0.1%

Most occurring characters

ValueCountFrequency (%)
p 111897
12.7%
m 93632
10.6%
n 87522
9.9%
a 86861
9.9%
(space) 59116
 
6.7%
u 58332
 
6.6%
d 52904
 
6.0%
e 48008
 
5.5%
t 47456
 
5.4%
l 46947
 
5.3%
Other values (8) 188083
21.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 880758
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
p 111897
12.7%
m 93632
10.6%
n 87522
9.9%
a 86861
9.9%
(space) 59116
 
6.7%
u 58332
 
6.6%
d 52904
 
6.0%
e 48008
 
5.5%
t 47456
 
5.4%
l 46947
 
5.3%
Other values (8) 188083
21.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 880758
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
p 111897
12.7%
m 93632
10.6%
n 87522
9.9%
a 86861
9.9%
(space) 59116
 
6.7%
u 58332
 
6.6%
d 52904
 
6.0%
e 48008
 
5.5%
t 47456
 
5.4%
l 46947
 
5.3%
Other values (8) 188083
21.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 880758
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
p 111897
12.7%
m 93632
10.6%
n 87522
9.9%
a 86861
9.9%
(space) 59116
 
6.7%
u 58332
 
6.6%
d 52904
 
6.0%
e 48008
 
5.5%
t 47456
 
5.4%
l 46947
 
5.3%
Other values (8) 188083
21.4%

waterpoint_type_group
Categorical

High correlation 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

communal standpipe  34625
hand pump           17488
other                6380
improved spring       784
cattle trough         116

Length

Max length: 18
Median length: 18
Mean length: 13.902879
Min length: 3

Characters and Unicode

Total characters: 825831
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe
4th row: communal standpipe
5th row: communal standpipe

Common Values

Value               Count  Frequency (%)
communal standpipe  34625  58.3%
hand pump           17488  29.4%
other                6380  10.7%
improved spring       784   1.3%
cattle trough         116   0.2%
dam                     7  < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
communal 34625
30.8%
standpipe 34625
30.8%
hand 17488
15.6%
pump 17488
15.6%
other 6380
 
5.7%
improved 784
 
0.7%
spring 784
 
0.7%
cattle 116
 
0.1%
trough 116
 
0.1%
dam 7
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
p 105794
12.8%
m 87529
10.6%
n 87522
10.6%
a 86861
10.5%
(space) 53013
 
6.4%
d 52904
 
6.4%
u 52229
 
6.3%
e 41905
 
5.1%
o 41905
 
5.1%
t 41353
 
5.0%
Other values (8) 174816
21.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 825831
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
p 105794
12.8%
m 87529
10.6%
n 87522
10.6%
a 86861
10.5%
(space) 53013
 
6.4%
d 52904
 
6.4%
u 52229
 
6.3%
e 41905
 
5.1%
o 41905
 
5.1%
t 41353
 
5.0%
Other values (8) 174816
21.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 825831
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
p 105794
12.8%
m 87529
10.6%
n 87522
10.6%
a 86861
10.5%
(space) 53013
 
6.4%
d 52904
 
6.4%
u 52229
 
6.3%
e 41905
 
5.1%
o 41905
 
5.1%
t 41353
 
5.0%
Other values (8) 174816
21.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 825831
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
p 105794
12.8%
m 87529
10.6%
n 87522
10.6%
a 86861
10.5%
(space) 53013
 
6.4%
d 52904
 
6.4%
u 52229
 
6.3%
e 41905
 
5.1%
o 41905
 
5.1%
t 41353
 
5.0%
Other values (8) 174816
21.2%

Interactions

[Figures: 100 pairwise interaction plots]

Correlations

variable amount_tsh basin construction_year district_code extraction_type extraction_type_class extraction_type_group gps_height id latitude longitude management management_group num_private payment payment_type permit population public_meeting quality_group quantity quantity_group region region_code scheme_management source source_class source_type water_quality waterpoint_type waterpoint_type_group
amount_tsh 1.000 0.010 0.408 -0.089 0.014 0.012 0.007 0.342 -0.005 -0.262 0.209 0.024 0.026 0.032 0.016 0.016 0.000 0.343 0.013 0.000 0.000 0.000 0.014 -0.150 0.019 0.007 0.017 0.010 0.000 0.000 0.000
basin 0.010 1.000 0.525 0.228 0.246 0.250 0.239 0.280 0.000 0.606 0.647 0.222 0.142 0.007 0.245 0.245 0.200 0.024 0.114 0.139 0.139 0.139 0.767 0.475 0.240 0.245 0.123 0.255 0.119 0.208 0.196
construction_year 0.408 0.525 1.000 -0.063 0.318 0.269 0.314 0.612 -0.003 -0.192 0.479 0.287 0.081 0.050 0.267 0.267 0.082 0.678 0.025 0.126 0.129 0.129 0.945 -0.209 0.289 0.282 0.098 0.275 0.130 0.232 0.230
district_code -0.089 0.228 -0.063 1.000 0.115 0.100 0.107 -0.136 0.000 -0.134 0.131 0.066 0.049 0.003 0.089 0.089 0.186 -0.074 0.174 0.067 0.074 0.074 0.385 0.119 0.075 0.070 0.070 0.083 0.058 0.073 0.069
extraction_type 0.014 0.246 0.318 0.115 1.000 1.000 1.000 0.160 0.000 0.195 0.226 0.166 0.128 0.000 0.244 0.244 0.212 0.029 0.137 0.173 0.122 0.122 0.261 0.169 0.192 0.408 0.309 0.483 0.148 0.505 0.537
extraction_type_class 0.012 0.250 0.269 0.100 1.000 1.000 1.000 0.168 0.000 0.196 0.196 0.200 0.118 0.009 0.231 0.231 0.171 0.027 0.100 0.160 0.108 0.108 0.358 0.139 0.207 0.455 0.271 0.440 0.147 0.504 0.536
extraction_type_group 0.007 0.239 0.314 0.107 1.000 1.000 1.000 0.156 0.000 0.186 0.217 0.157 0.121 0.000 0.242 0.242 0.197 0.027 0.106 0.170 0.116 0.116 0.291 0.164 0.171 0.393 0.271 0.465 0.145 0.504 0.536
gps_height 0.342 0.280 0.612 -0.136 0.160 0.168 0.156 1.000 -0.005 -0.087 0.160 0.142 0.054 0.043 0.167 0.167 0.190 0.547 0.079 0.098 0.100 0.100 0.466 -0.202 0.134 0.154 0.087 0.177 0.084 0.122 0.121
id -0.005 0.000 -0.003 0.000 0.000 0.000 0.000 -0.005 1.000 0.003 0.001 0.000 0.000 0.004 0.001 0.001 0.000 0.003 0.010 0.000 0.000 0.000 0.004 0.001 0.000 0.002 0.007 0.006 0.000 0.008 0.008
latitude -0.262 0.606 -0.192 -0.134 0.195 0.196 0.186 -0.087 0.003 1.000 -0.362 0.201 0.143 -0.009 0.210 0.210 0.196 -0.141 0.061 0.120 0.139 0.139 0.675 0.192 0.230 0.183 0.123 0.189 0.113 0.145 0.133
longitude 0.209 0.647 0.479 0.131 0.226 0.196 0.217 0.160 0.001 -0.362 1.000 0.237 0.106 0.136 0.199 0.199 0.120 0.398 0.064 0.103 0.101 0.101 0.793 -0.458 0.294 0.169 0.064 0.140 0.129 0.193 0.192
management 0.024 0.222 0.287 0.066 0.166 0.200 0.157 0.142 0.000 0.201 0.237 1.000 1.000 0.033 0.226 0.226 0.240 0.032 0.292 0.157 0.239 0.239 0.343 0.138 0.794 0.216 0.201 0.259 0.139 0.155 0.166
management_group 0.026 0.142 0.081 0.049 0.128 0.118 0.121 0.054 0.000 0.143 0.106 1.000 1.000 0.020 0.145 0.145 0.044 0.031 0.250 0.138 0.228 0.228 0.222 0.084 0.700 0.225 0.136 0.219 0.139 0.070 0.067
num_private 0.032 0.007 0.050 0.003 0.000 0.009 0.000 0.043 0.004 -0.009 0.136 0.033 0.020 1.000 0.007 0.007 0.007 0.033 0.000 0.011 0.000 0.000 0.008 -0.093 0.016 0.000 0.000 0.000 0.009 0.000 0.000
payment 0.016 0.245 0.267 0.089 0.244 0.231 0.242 0.167 0.001 0.210 0.199 0.226 0.145 0.007 1.000 1.000 0.185 0.019 0.144 0.143 0.127 0.127 0.357 0.153 0.203 0.204 0.098 0.190 0.133 0.163 0.163
payment_type 0.016 0.245 0.267 0.089 0.244 0.231 0.242 0.167 0.001 0.210 0.199 0.226 0.145 0.007 1.000 1.000 0.185 0.019 0.144 0.143 0.127 0.127 0.357 0.153 0.203 0.204 0.098 0.190 0.133 0.163 0.163
permit 0.000 0.200 0.082 0.186 0.212 0.171 0.197 0.190 0.000 0.196 0.120 0.240 0.044 0.007 0.185 0.185 1.000 0.035 0.137 0.119 0.056 0.056 0.408 0.159 0.293 0.221 0.114 0.219 0.134 0.155 0.147
population 0.343 0.024 0.678 -0.074 0.029 0.027 0.027 0.547 0.003 -0.141 0.398 0.032 0.031 0.033 0.019 0.019 0.035 1.000 0.032 0.004 0.009 0.009 0.050 -0.093 0.044 0.021 0.000 0.020 0.000 0.029 0.028
public_meeting 0.013 0.114 0.025 0.174 0.137 0.100 0.106 0.079 0.010 0.061 0.064 0.292 0.250 0.000 0.144 0.144 0.137 0.032 1.000 0.060 0.104 0.104 0.258 0.120 0.268 0.109 0.059 0.096 0.060 0.094 0.093
quality_group 0.000 0.139 0.126 0.067 0.173 0.160 0.170 0.098 0.000 0.120 0.103 0.157 0.138 0.011 0.143 0.143 0.119 0.004 0.060 1.000 0.279 0.279 0.215 0.079 0.082 0.179 0.135 0.174 1.000 0.138 0.131
quantity 0.000 0.139 0.129 0.074 0.122 0.108 0.116 0.100 0.000 0.139 0.101 0.239 0.228 0.000 0.127 0.127 0.056 0.009 0.104 0.279 1.000 1.000 0.212 0.090 0.148 0.205 0.137 0.199 0.280 0.092 0.084
quantity_group 0.000 0.139 0.129 0.074 0.122 0.108 0.116 0.100 0.000 0.139 0.101 0.239 0.228 0.000 0.127 0.127 0.056 0.009 0.104 0.279 1.000 1.000 0.212 0.090 0.148 0.205 0.137 0.199 0.280 0.092 0.084
region 0.014 0.767 0.945 0.385 0.261 0.358 0.291 0.466 0.004 0.675 0.793 0.343 0.222 0.008 0.357 0.357 0.408 0.050 0.258 0.215 0.212 0.212 1.000 0.790 0.381 0.322 0.220 0.356 0.198 0.294 0.271
region_code -0.150 0.475 -0.209 0.119 0.169 0.139 0.164 -0.202 0.001 0.192 -0.458 0.138 0.084 -0.093 0.153 0.153 0.159 -0.093 0.120 0.079 0.090 0.090 0.790 1.000 0.170 0.126 0.082 0.109 0.074 0.127 0.122
scheme_management 0.019 0.240 0.289 0.075 0.192 0.207 0.171 0.134 0.000 0.230 0.294 0.794 0.700 0.016 0.203 0.203 0.293 0.044 0.268 0.082 0.148 0.148 0.381 0.170 1.000 0.224 0.218 0.269 0.085 0.171 0.182
source 0.007 0.245 0.282 0.070 0.408 0.455 0.393 0.154 0.002 0.183 0.169 0.216 0.225 0.000 0.204 0.204 0.221 0.021 0.109 0.179 0.205 0.205 0.322 0.126 0.224 1.000 1.000 1.000 0.152 0.379 0.385
source_class 0.017 0.123 0.098 0.070 0.309 0.271 0.271 0.087 0.007 0.123 0.064 0.201 0.136 0.000 0.098 0.098 0.114 0.000 0.059 0.135 0.137 0.137 0.220 0.082 0.218 1.000 1.000 1.000 0.135 0.284 0.284
source_type 0.010 0.255 0.275 0.083 0.483 0.440 0.465 0.177 0.006 0.189 0.140 0.259 0.219 0.000 0.190 0.190 0.219 0.020 0.096 0.174 0.199 0.199 0.356 0.109 0.269 1.000 1.000 1.000 0.159 0.367 0.380
water_quality 0.000 0.119 0.130 0.058 0.148 0.147 0.145 0.084 0.000 0.113 0.129 0.139 0.139 0.009 0.133 0.133 0.134 0.000 0.060 1.000 0.280 0.280 0.198 0.074 0.085 0.152 0.135 0.159 1.000 0.127 0.132
waterpoint_type 0.000 0.208 0.232 0.073 0.505 0.504 0.504 0.122 0.008 0.145 0.193 0.155 0.070 0.000 0.163 0.163 0.155 0.029 0.094 0.138 0.092 0.092 0.294 0.127 0.171 0.379 0.284 0.367 0.127 1.000 1.000
waterpoint_type_group 0.000 0.196 0.230 0.069 0.537 0.536 0.536 0.121 0.008 0.133 0.192 0.166 0.067 0.000 0.163 0.163 0.147 0.028 0.093 0.131 0.084 0.084 0.271 0.122 0.182 0.385 0.284 0.380 0.132 1.000 1.000

Missing values

A simple visualization of nullity by column.
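
The per-column nullity shown in that chart comes down to counting missing entries column by column. A minimal sketch in plain Python, assuming records are held as dictionaries with `None` standing in for NaN; the rows and the `missing_counts` helper are hypothetical and not part of the report:

```python
# Hypothetical records; None stands in for the NaN values seen in the sample table.
rows = [
    {"funder": "Roman", "scheme_name": "Roman", "public_meeting": True},
    {"funder": "Grumeti", "scheme_name": None, "public_meeting": None},
    {"funder": None, "scheme_name": None, "public_meeting": True},
]


def missing_counts(records):
    """Return a dict mapping each column name to its number of missing (None) entries."""
    counts = {}
    for record in records:
        for column, value in record.items():
            # Adding a bool to an int increments the count only when the value is missing.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


counts = missing_counts(rows)
for column, n_missing in counts.items():
    print(f"{column}: {n_missing} missing of {len(rows)}")
```

With a pandas DataFrame the same counts would normally come from `df.isna().sum()`.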
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
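
Nullity correlation can be read as the Pearson correlation between present/missing indicators of two columns: 1 means the two columns are always missing together, -1 means one is present exactly when the other is missing. A minimal sketch under that reading, using only the standard library; the `nullity_correlation` helper and the example columns are hypothetical, and the report's own heatmap may differ in detail (for example in which columns it includes):

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present (1) / missing (0) indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is entirely present or entirely missing.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns; None marks a missing value.
funder = ["Roman", "Grumeti", None, "Unicef", None]
installer = ["Roman", "GRUMETI", None, "UNICEF", None]
scheme_name = [None, "scheme a", "scheme b", None, "scheme c"]

same_pattern = nullity_correlation(funder, installer)       # missing in the same rows
opposite_pattern = nullity_correlation(funder, scheme_name)  # mostly complementary
```

With pandas, the full matrix of such values would typically be `df.isnull().corr()`.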

Sample

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | NaN | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group
59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump
59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe
59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe
59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other
59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe
59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe
59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe
59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump
59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump
59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump